- Search Results (Page 1 of 1)
- Search for: All records
- Total Resources: 5
- Author / Contributor (Filter by Author / Creator):
- Xie, Tengyang (5)
- Cheng, Ching-An (2)
- Jiang, Nan (2)
- Agarwal, Alekh (1)
- Cai, Mu (1)
- Chow, Yinlam (1)
- Ghavamzadeh, Mohammad (1)
- Lee, Yong Jae (1)
- Liu, Bo (1)
- Lyu, Daoming (1)
- Mineiro, Paul (1)
- Wang, Haoxiang (1)
- Xiong, Wei (1)
- Xu, Yangyang (1)
- Yoon, Daesub (1)
- Zhang, Jianrui (1)
- Zhang, Tong (1)
- Zhao, Han (1)
- Wang, Haoxiang; Xiong, Wei; Xie, Tengyang; Zhao, Han; Zhang, Tong (, Association for Computational Linguistics)
- Cheng, Ching-An; Xie, Tengyang; Jiang, Nan; Agarwal, Alekh (, Proceedings of the 39th International Conference on Machine Learning)
  We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: a policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Compared with existing works, our framework notably offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks. (A minimal illustrative sketch of this update appears after the resource list below.)
- Xie, Tengyang; Cheng, Ching-An; Jiang, Nan; Mineiro, Paul; Agarwal, Alekh (, Advances in Neural Information Processing Systems (selected for oral presentation))
- Xie, Tengyang; Liu, Bo; Xu, Yangyang; Ghavamzadeh, Mohammad; Chow, Yinlam; Lyu, Daoming; Yoon, Daesub (, Advances in Neural Information Processing Systems)
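The ATAC entry above describes the algorithm only in prose. As a rough illustration of the relative-pessimism idea it mentions (a critic trained adversarially to make the current policy look no better than the data-collection policy while staying Bellman-consistent, and an actor that maximizes the resulting pessimistic advantage), here is a minimal PyTorch-style sketch. The network sizes, optimizers, the `beta` and `gamma` values, and the random toy batch are placeholder assumptions for demonstration, not the authors' implementation or hyperparameters.

```python
# Minimal, illustrative sketch of the relative-pessimism update described in the
# ATAC abstract above. Everything here (network sizes, optimizers, beta, gamma,
# and the random toy batch) is an assumption for demonstration, not the paper's code.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic_opt = torch.optim.Adam(critic.parameters(), lr=5e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=5e-5)

beta = 1.0    # weight on the Bellman-consistency penalty (assumed value)
gamma = 0.99  # discount factor (assumed value)

def q_value(state, action):
    return critic(torch.cat([state, action], dim=-1)).squeeze(-1)

def atac_step(s, a, r, s_next):
    """One adversarial actor-critic update on a batch of offline transitions."""
    # Critic: find a data-consistent Q-function under which the current policy
    # looks no better than the behavior policy that collected the data.
    pi_a = actor(s).detach()
    relative_pessimism = (q_value(s, pi_a) - q_value(s, a)).mean()
    with torch.no_grad():
        td_target = r + gamma * q_value(s_next, actor(s_next))
    bellman_error = ((q_value(s, a) - td_target) ** 2).mean()
    critic_loss = relative_pessimism + beta * bellman_error
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize the pessimistic advantage over the behavior policy.
    actor_loss = -(q_value(s, actor(s)) - q_value(s, a)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()

# Toy batch of random offline transitions, only to show the call signature.
s = torch.randn(32, state_dim)
a = torch.rand(32, action_dim) * 2 - 1
r = torch.randn(32)
s_next = torch.randn(32, state_dim)
print(atac_step(s, a, r, s_next))
```

The Stackelberg structure from the abstract shows up in the update ordering: the critic is trained against a frozen (detached) actor, and the actor then best-responds to the resulting pessimistic critic.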